Database : A_Geneseq_23Sep04:*

1: geneseqp1980s:*
2: geneseqp1990s:*
3: geneseqp2000s:*
4: geneseqp2001s:*
5: geneseqp2002s:*
6: geneseqp2003as:*
7: geneseqp2003bs:*
8: geneseqp2004s:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution. 



SUMMARIES

Result          Query
   No.   Score  Match  Length  DB  ID        Description
     1    2794  100.0     497   3  AAY93750  Aay93750 Amino aci
     2   439.5   15.7     174   6  AAE30346  Aae30346 Perna can
     3   439.5   15.7     175   6  AAE30347  Aae30347 Crassostr
     4     260    9.3    1529   2  AAR97985  Aar97985 CORK pota
     5     217    7.8     351   2  AAR24393  Aar24393 Sequence
     6     178    6.4     339   6  ADA35264  Ada35264 Acinetoba
     7   173.5    6.2     244   2  AAR67409  Aar67409 Rat super
     8   173.5    6.2     244   5  AAM52476  Aam52476 Superoxid
     9   173.5    6.2     244   7  ADD48518  Add48518 Rat Prote
    10   172.5    6.2     221   2  AAR27934  Aar27934 GAG fusio
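The Query Match column in this summary is consistent with the score of each hit expressed as a percentage of the score at 100.0% (2794 for result 1). The report does not state this formula, so the following is only a minimal sketch under that assumption; the function name is illustrative.

```python
def query_match_percent(score: float, self_score: float) -> float:
    """Return score as a percentage of the perfect (self) score, to one decimal.

    Assumption: Query Match = 100 * score / self_score, where self_score is
    the score reported for the 100.0% hit (2794 in the Geneseq summary).
    """
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return round(100.0 * score / self_score, 1)


if __name__ == "__main__":
    # Scores taken from the Geneseq summary rows.
    for score in (2794, 439.5, 260, 217, 178, 173.5, 172.5):
        print(score, query_match_percent(score, 2794))
```

Under this assumption the computed values agree with the Query Match column for every row of the table.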



Database : Issued_Patents_AA:*

1: /cgn2_6/ptodata/1/iaa/5A_COMB.pep:*
2: /cgn2_6/ptodata/1/iaa/5B_COMB.pep:*
3: /cgn2_6/ptodata/1/iaa/6A_COMB.pep:*
4: /cgn2_6/ptodata/1/iaa/6B_COMB.pep:*
5: /cgn2_6/ptodata/1/iaa/PCTUS_COMB.pep:*
6: /cgn2_6/ptodata/1/iaa/backfiles1.pep:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

Result          Query
   No.   Score  Match  Length  DB  ID                  Description
     1     178    6.4     339   4  US-09-328-352-6551  Sequence 6551, Ap
     2   173.5    6.2     244   3  US-08-679-493A-188  Sequence 188, App
     3     168    6.0     150   2  US-08-722-050-9     Sequence 9, Appli
     4     168    6.0     150   4  US-09-883-985-9     Sequence 9, Appli
     5     167    6.0     154   3  US-08-679-493A-211  Sequence 211, App
     6     166    5.9     151   2  US-08-722-050-10    Sequence 10, Appl
     7     166    5.9     151   4  US-09-883-985-10    Sequence 10, Appl
     8   165.5    5.9     152   2  US-08-722-050-12    Sequence 12, Appl
     9   165.5    5.9     152   4  US-09-883-985-12    Sequence 12, Appl
    10   164.5    5.9     153   3  US-08-679-493A-207  Sequence 207, App
    11     164    5.9     151   3  US-08-679-493A-191  Sequence 191, App
    12   163.5    5.9     153   3  US-08-679-493A-201  Sequence 201, App
    13   161.5    5.8     153   3  US-08-679-493A-202  Sequence 202, App
    14   160.5    5.7     152   6  5171680-3           Patent No. 5171680
    15     160    5.7    1099   4  US-09-881-654-4     Sequence 4, Appli
    16     160    5.7    1099   4  US-10-637-323-4     Sequence 4, Appli
    17   159.5    5.7     699   4  US-09-538-092-995   Sequence 995, App
    18     159    5.7     166   3  US-08-679-493A-209  Sequence 209, App


Database : Published_Applications_AA:*

1: /cgn2_6/ptodata/1/pubpaa/US07_PUBCOMB.pep:*
2: /cgn2_6/ptodata/1/pubpaa/PCT_NEW_PUB.pep:*
3: /cgn2_6/ptodata/1/pubpaa/US06_NEW_PUB.pep:*
4: /cgn2_6/ptodata/1/pubpaa/US06_PUBCOMB.pep:*
5: /cgn2_6/ptodata/1/pubpaa/US07_NEW_PUB.pep:*
6: /cgn2_6/ptodata/1/pubpaa/PCTUS_PUBCOMB.pep:*
7: /cgn2_6/ptodata/1/pubpaa/US08_NEW_PUB.pep:*
8: /cgn2_6/ptodata/1/pubpaa/US08_PUBCOMB.pep:*
9: /cgn2_6/ptodata/1/pubpaa/US09A_PUBCOMB.pep:*
10: /cgn2_6/ptodata/1/pubpaa/US09B_PUBCOMB.pep:*
11: /cgn2_6/ptodata/1/pubpaa/US09C_PUBCOMB.pep:*
12: /cgn2_6/ptodata/1/pubpaa/US09_NEW_PUB.pep:*
13: /cgn2_6/ptodata/1/pubpaa/US10A_PUBCOMB.pep:*
14: /cgn2_6/ptodata/1/pubpaa/US10B_PUBCOMB.pep:*
15: /cgn2_6/ptodata/1/pubpaa/US10C_PUBCOMB.pep:*
16: /cgn2_6/ptodata/1/pubpaa/US10D_PUBCOMB.pep:*
17: /cgn2_6/ptodata/1/pubpaa/US10_NEW_PUB.pep:*
18: /cgn2_6/ptodata/1/pubpaa/US11_NEW_PUB.pep:*
19: /cgn2_6/ptodata/1/pubpaa/US60_NEW_PUB.pep:*
20: /cgn2_6/ptodata/1/pubpaa/US60_PUBCOMB.pep:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID                    Description
     1     170    6.1     152  17  US-10-425-115-233754  Sequence 233754,
     2     170    6.1     153  15  US-10-425-114-48136   Sequence 48136, A
     3     170    6.1     153  15  US-10-425-114-52073   Sequence 52073, A
     4     170    6.1     153  15  US-10-425-114-52143   Sequence 52143, A
     5     170    6.1     153  15  US-10-425-114-59106   Sequence 59106, A
     6     170    6.1     153  15  US-10-425-114-61368   Sequence 61368, A
     7     170    6.1     153  15  US-10-425-114-62898   Sequence 62898, A
     8     170    6.1     153  15  US-10-425-114-66160   Sequence 66160, A
     9     170    6.1     153  15  US-10-425-114-72460   Sequence 72460, A



Database : PIR_79:*

1: pir1:*
2: pir2:*
3: pir3:*
4: pir4:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID      Description
     1     213    7.6     351   1  KGZQHL  histidine-rich gly
     2   204.5    7.3     735   2  T45059  hypothetical prote
     3     178    6.4     152   2  JW0084  superoxide dismuta
     4     178    6.4     852   2  A34373  histidine-rich cal
     5   174.5    6.2     251   2  S52859  superoxide dismuta
     6   173.5    6.2     152   2  T06570  superoxide dismuta
     7   173.5    6.2     244   2  A49097  superoxide dismuta
     8     173    6.2    1840   2  T29091  transitin - chicke
     9     168    6.0     151   2  A29077  superoxide dismuta
    10     167    6.0     154   1  DSBYC   superoxide dismuta
    11   164.5    5.9     154   1  DSHOCZ  superoxide dismuta
    12     164    5.9     152   2  S07007  superoxide dismuta
    13     163    5.8     152   2  S22508  superoxide dismuta
    14     163    5.8     152   2  S72235  superoxide dismuta
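Each summary row carries the same seven fields (result number, score, query match, length, database index, ID, truncated description). As an illustration only, a row of this shape could be split into typed fields as sketched below; the class and function names are hypothetical and are not part of the search software.

```python
from dataclasses import dataclass


@dataclass
class SummaryRow:
    result_no: int
    score: float
    query_match: float   # percent
    length: int
    db: int              # index into the database list printed above the table
    seq_id: str
    description: str     # truncated in the report, kept as-is


def parse_summary_row(line: str) -> SummaryRow:
    """Split one whitespace-separated summary row into typed fields."""
    parts = line.split(maxsplit=6)
    if len(parts) < 6:
        raise ValueError(f"not a summary row: {line!r}")
    no, score, match, length, db, seq_id = parts[:6]
    description = parts[6].strip() if len(parts) > 6 else ""
    return SummaryRow(int(no), float(score), float(match),
                      int(length), int(db), seq_id, description)


if __name__ == "__main__":
    row = parse_summary_row("3  178  6.4  152  2  JW0084  superoxide dismuta")
    print(row)
```

Splitting at most six times keeps the description intact even when it contains spaces, as in "transitin - chicke".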



Database : UniProt_02:*

1: uniprot_sprot:*
2: uniprot_trembl:*

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID          Description
     1    2790   99.9     517   2  Q9BKB9      Q9bkb9 perna canal
     2   439.5   15.7     174   2  Q86FW9      Q86fw9 crassostrea
     3     221    7.9     294   2  Q7QDP9      Q7qdp9 anopheles g
     4     213    7.6     351   1  HRPX_PLALO  P04929 Plasmodium
     5   204.5    7.3     735   2  Q9NES7      Q9nes7 caenorhabdi
     6   196.5    7.0    2245   2  Q8IAM6      Q8iam6 Plasmodium
     7     191    6.8     722   2  Q7YS21      Q7ys21 macaca fasc
     8   178.5    6.4     726   2  Q9QZV4      Q9qzv4 mus musculu
     9     178    6.4     152   1  SODC_SOYBN  Q7M1R5 glycine max
    10     178    6.4     852   1  SRCH_RABIT  P16230 oryctolagus
    11     177    6.3     738   2  Q9WVE4      Q9wve4 mus musculu
    12     175    6.3     151   1  SODC_HALRO  P81926 halocynthia
    13   174.5    6.2     251   2  Q64466      Q64466 mus musculu
    14     174    6.2     152   2  Q9ZNQ4      Q9znq4 cicer ariet
    15   173.5    6.2     151   1  SODC_PEA    Q02610 pisum sativ



RESULT 1
Q9BKB9
ID   Q9BKB9      PRELIMINARY;      PRT;   517 AA.
AC   Q9BKB9;
DT   01-JUN-2001 (TrEMBLrel. 17, Created)
DT   01-JUN-2001 (TrEMBLrel. 17, Last sequence update)
DT   01-MAR-2004 (TrEMBLrel. 26, Last annotation update)
DE   Pernin precursor.
OS   Perna canaliculus (greenshell mussel).
OC   Eukaryota; Metazoa; Mollusca; Bivalvia; Pteriomorphia; Mytiloida;
OC   Mytiloidea; Mytilidae; Perna.
OX   NCBI_TaxID=38949;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=21186417; PubMed=11290459;
RA   Scotti P.D., Dearing S.C., Greenwood D.R., Newcomb R.D.;
RT   "Pernin: a novel self-aggregating haemolymph protein from the New
RT   Zealand green-lipped mussel Perna canaliculus (bivalvia: mytilidae).";
RL   Comp. Biochem. Physiol. B, Biochem. Mol. Biol. 128:767-779(2001).
DR   EMBL; AF273766; AAK20952.1;
DR   HSSP; P00445; 1F1G.
DR   GO; GO:0004785; F:copper, zinc superoxide dismutase activity; IEA.
DR   GO; GO:0046872; F:metal ion binding; IEA.
DR   GO; GO:0006801; P:superoxide metabolism; IEA.
DR   InterPro; IPR001424; SOD_CU_ZN.
DR   Pfam; PF00080; Sod_Cu; 3.
DR   PRINTS; PR00068; CUZNDISMTASE.
KW   Signal.
FT   SIGNAL        1     20
FT   CHAIN        21    517       pernin.

SQ SEQUENCE 517 AA; 57222 MW; 87B8FBFPE8555 OlE CRC64; 

Query Match 99.9%; Score 2790; DB 2; Length 517; 

Best Local Similarity 99.8%; Pred. No. 1.6e-198; 

Matches 496; Conservative 1; Mismatches 0; Indels 0; Gaps 0 

Qy      1 DGEQCNDGQNKDDHHDDHHDDHHDDHDDDDETMHYAQCEMEPNPHMASSLHHHVHGSIEL 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     21 DGEQCNDGQNKDDHHDDHHDDHHDDHDDDDETMHYAQCEMEPNPHMASSLHHHVHGSIEL 80

Qy     61 SQKGHGAVYLELHLVGFNTSEDHDDHHHGLHLHMLGDMSAGCDSIGELYNAHPEKHADPG 120
          ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     81 SQQGHGAVYLELHLVGFNTSEDHDDHHHGLHLHMLGDMSAGCDSIGELYNAHPEKHADPG 140

Qy    121 DLGDLVDDDRGVVNEVHHYAWLDIDGTAPNTEALIGHSMTILQGSHTDADTPASRIACCV 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    141 DLGDLVDDDRGVVNEVHHYAWLDIDGTAPNTEALIGHSMTILQGSHTDADTPASRIACCV 200

Qy    181 IGHGKARPETAAALHHELEEDKTEHYAHCDVRSNTHQPKALHHHVHGTIDFKQVGYGDLE 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    201 IGHGKARPETAAALHHELEEDKTEHYAHCDVRSNTHQPKALHHHVHGTIDFKQVGYGDLE 260

Qy    241 VSYHLEGFNVSDDHKDHLHDVQIYANGDLTSGCDNLGAKYDPHEDYHSELGDLGDIHDDD 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    261 VSYHLEGFNVSDDHKDHLHDVQIYANGDLTSGCDNLGAKYDPHEDYHSELGDLGDIHDDD 320

Qy    301 HGVVNESHRYSWINIFGDDSVLGRSIAIHQRDHLHKSAKIACCVIGRGQSHPEIVHRAKC 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    321 HGVVNESHRYSWINIFGDDSVLGRSIAIHQRDHLHKSAKIACCVIGRGQSHPEIVHRAKC 380

Qy    361 VVRPNTESTGLHHHVSGSITFEQTPGGSTHMTADLKGFNVSEDLSHHRHGVQLHEWGDMS 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    381 VVRPNTESTGLHHHVSGSITFEQTPGGSTHMTADLKGFNVSEDLSHHRHGVQLHEWGDMS 440

Qy    421 HGCHSLGRMYHGHDDAHDPKRPGDLGDVIDDSHGIVHSTRTFDHLNVEDLNARSLVIMQG 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    441 HGCHSLGRMYHGHDDAHDPKRPGDLGDVIDDSHGIVHSTRTFDHLNVEDLNARSLVIMQG 500

Qy    481 GHEVESERVACCVIGRA 497
          |||||||||||||||||
Db    501 GHEVESERVACCVIGRA 517
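
The statistics printed for this hit (Matches 496; Conservative 1; Mismatches 0; Indels 0) are consistent with Best Local Similarity being the share of identical columns among all aligned columns. The report does not define the quantity, so this is a sketch under that assumption, with an illustrative function name.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percent of aligned columns that are exact matches, to one decimal.

    Assumption: every aligned column is counted once as a match, a
    conservative substitution, a mismatch, or an indel.
    """
    aligned = matches + conservative + mismatches + indels
    if aligned == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / aligned, 1)


if __name__ == "__main__":
    # Values from the Q9BKB9 hit: 496 matches and 1 conservative substitution.
    print(best_local_similarity(496, 1, 0, 0))
```

With the Q9BKB9 values this gives 99.8, the figure printed in the report; counting the conservative substitution as a match would give 100.0 instead.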



Database : EST:*

1: gb_est1:*
2: gb_est2:*
3: gb_htc:*
4: gb_est3:*
5: gb_est4:*
6: gb_est5:*
7: gb_est6:*
8: gb_gss1:*
9: gb_gss2:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID        Description
     1   155.8   10.4     688   6  CD649186  CD649186 AUF_104_N
     2   149.4   10.0     704   6  CD648295  CD648295 AUF_102_G
     3   147.8    9.9     682   6  CD648076  CD648076 AUF_101_M
     4   147.8    9.9     697   6  CD647088  CD647088 AUF_107_A
     5   147.8    9.9     697   6  CD647705  CD647705 AUF_108_L
     6   147.8    9.9     706   6  CD649879  CD649879 CvGil0058
     7     147    9.9     698   6  CD650428  CD650428 CvGil0113
     8   146.2    9.8     696   6  CD648647  CD648647 AUF_103_F
     9   146.2    9.8     699   6  CD648443  CD648443 AUF_102_M
    10   146.2    9.8     720   6  CD648998  CD648998 AUF_104_E
    11   146.2    9.8     725   6  CD649188  CD649188 AUF_104_N
    12   146.2    9.8     734   6  CD648621  CD648621 AUF_103_E
    13   145.4    9.8     696   6  CD648155  CD648155 AUF_101_P
    14   145.4    9.8     713   6  CD649071  CD649071 AUF_104_I
    15   144.6    9.7     698   6  CD648763  CD648763 AUF_103_K



Scoring table: OLIGO_NUC 

Gapop 60.0, Gapext 60.0

Searched: 32822875 seqs, 18219865908 residues 

Word size : 0 

Total number of hits satisfying chosen parameters: 65645750 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Listing first 45 summaries 

Database : EST:*

1: gb_est1:*
2: gb_est2:*
3: gb_htc:*
4: gb_est3:*
5: gb_est4:*
6: gb_est5:*
7: gb_est6:*
8: gb_gss1:*
9: gb_gss2:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID        Description
     1      26    1.7     766   6  BH948315  BH948315 Obu82g07.
     2      26    1.7    1101   9  CNS00HD3  AL073332 Drosophil
     3      25    1.7     529   5  BQ118156  BQ118156 EST603732
     4      25    1.7     756   6  CB942058  CB942058 AGENCOURT
     5      25    1.7     946   6  CF265550  CF265550 AGENCOURT
     6      23    1.5     541   1  AI724181  AI724181 RHIZ1_8_B
     7      23    1.5     602   5  BW326909  BW326909 BW326909



Database : GenEmbl:*

1: gb_ba:*
2: gb_htg:*
3: gb_in:*
4: gb_om:*
5: gb_ov:*
6: gb_pat:*
7: gb_ph:*
8: gb_pl:*
9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID          Description
     1  1490.6  100.0    1491   6  BD268169    BD268169 Serine pr
     2  1490.6  100.0    1611   6  BD268170    BD268170 Serine pr
     3  1484.2   99.5    1700   3  AF273766    AF273766 Perna can
     4   128.4    8.6     603   3  AY256853    AY256853 Crassostr
 c   5    88.4    5.9  115758   9  AC104634    AC104634 Homo sapi
     6      86    5.8  110000   2  PFMAL13_24  Continuation (25 o
 c   7    74.8    5.0  164347   9  AC104805    AC104805 Homo sapi
     8    74.8    5.0  186278   9  AC079176    AC079176 Homo sapi
 c   9      74    5.0   75111   5  BX276082    BX276082 Zebrafish



RESULT 1
BD268169
LOCUS       BD268169     1491 bp    DNA     linear   PAT 17-JUL-2003
DEFINITION  Serine protease inhibiors.
ACCESSION   BD268169
VERSION     BD268169.1  GI:33077937
KEYWORDS    JP 2002534063-A/1.
SOURCE      unidentified
  ORGANISM  unidentified
            unclassified.
REFERENCE   1  (bases 1 to 1491)
  AUTHORS   Scotti,P.D., Dearing,S.C., Greenwood,D.R. and Newcomb,R.D.
  TITLE     Serine protease inhibiors
  JOURNAL   Patent: JP 2002534063-A 1 15-OCT-2002;
            THE HORTICULTURE AND FOOD RESEARCH INSTITUTE OF NEW ZEALAND LTD
COMMENT     OS   Shellfish
            PN   JP 2002534063-A/1
            PD   15-OCT-2002
            PF   23-DEC-1999 JP 2000591076
            PR   23-DEC-1998 NZ 333568, 23-JUL-1999 NZ 336906
            PI   PAUL DOUGLAS SCOTTI, SALLY CAROLINE DEARING, DAVID ROGER
            PI   GREENWOOD,
            PI   RICHARD DAVID NEWCOMB
            PC   C12N15/09,A23L1/305,A61K38/00,A61P7/04,A61P43/00,C07K1/14,
            PC   C07K14/435,
            PC   C12N1/15,C12N1/19,C12N1/21,C12N5/10,C12N9/99//(C12N9/99,C12R1:
            PC   91),
            PC   C12N15/00,C12N5/00,A61K37/02
            CC   Serine protease inhibiors
            FH   Key             Location/Qualifiers
            FT   source          1..1491
            FT                   /organism='Shellfish'.
FEATURES             Location/Qualifiers
     source          1..1491
                     /organism="unidentified"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:32644"
ORIGIN



Query Match 100.0%; Score 1490.6; DB 6; Length 1491;

Best Local Similarity 100.0%; Pred. No. 0;

Matches 1491; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy      1 GAYGGGGAGCAGTGTAACGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 GAYGGGGAGCAGTGTAACGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT 60

Qy     61 GATCACCATGACGACCATGATGATGATGATGAAACAATGCACTATGCCCAGTGTGAAATG 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 GATCACCATGACGACCATGATGATGATGATGAAACAATGCACTATGCCCAGTGTGAAATG 120

Qy    121 GAACCAAACCCTCATATGGCTAGCAGCCTTCACCACCATGTCCATGGCAGCATAGAGTTG 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 GAACCAAACCCTCATATGGCTAGCAGCCTTCACCACCATGTCCATGGCAGCATAGAGTTG 180

Qy    181 TCACAGAAGGGTCATGGAGCTGTTTATCTAGAACTTCATCTTGTCGGATTCAACACAAGT 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 TCACAGAAGGGTCATGGAGCTGTTTATCTAGAACTTCATCTTGTCGGATTCAACACAAGT 240

Qy    241 GAAGACCATGACGACCACCATCATGGACTTCATCTGCACATGCTTGGTGACATGTCAGCA 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 GAAGACCATGACGACCACCATCATGGACTTCATCTGCACATGCTTGGTGACATGTCAGCA 300

Qy    301 GGTTGTGATTCTATTGGCGAACTGTACAATGCTCACCCAGAAAAACATGCTGACCCTGGT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GGTTGTGATTCTATTGGCGAACTGTACAATGCTCACCCAGAAAAACATGCTGACCCTGGT 360

Qy    361 GACCTCGGTGACCTGGTTGACGATGATAGGGGCGTGGTTAATGAAGTTCATCATTATGCT 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 GACCTCGGTGACCTGGTTGACGATGATAGGGGCGTGGTTAATGAAGTTCATCATTATGCT 420

Qy    421 TGGTTGGACATTGATGGTACAGCACCAAACACCGAAGCTCTCATTGGACACTCAATGACT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 TGGTTGGACATTGATGGTACAGCACCAAACACCGAAGCTCTCATTGGACACTCAATGACT 480

Qy    481 ATTTTACAAGGGAGTCACACCGATGCTGATACCCCAGCCAGTAGAATCGCCTGTTGTGTT 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 ATTTTACAAGGGAGTCACACCGATGCTGATACCCCAGCCAGTAGAATCGCCTGTTGTGTT 540

Qy    541 ATTGGTCATGGAAAAGCTCGCCCAGAAACAGCAGCTGCTCTACATCACGAGCTAGAGGAA 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    541 ATTGGTCATGGAAAAGCTCGCCCAGAAACAGCAGCTGCTCTACATCACGAGCTAGAGGAA 600

Qy    601 GATAAAACTGAGCATTATGCCCATTGTGACGTAAGATCTAATACACACCAACCAAAGGCT 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    601 GATAAAACTGAGCATTATGCCCATTGTGACGTAAGATCTAATACACACCAACCAAAGGCT 660

Qy    661 CTTCATCATCATGTCCACGGAACCATCGATTTCAAACAAGTTGGTTATGGTGACCTTGAA 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    661 CTTCATCATCATGTCCACGGAACCATCGATTTCAAACAAGTTGGTTATGGTGACCTTGAA 720

Qy    721 GTGTCCTACCATTTAGAGGGATTTAATGTAAGTGATGACCACAAAGATCATCTCCATGAC 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    721 GTGTCCTACCATTTAGAGGGATTTAATGTAAGTGATGACCACAAAGATCATCTCCATGAC 780

Qy    781 GTACAGATCTACGCCAACGGTGACCTGACCAGTGGATGTGATAACCTCGGTGCTAAATAT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    781 GTACAGATCTACGCCAACGGTGACCTGACCAGTGGATGTGATAACCTCGGTGCTAAATAT 840

Qy    841 GATCCTCATGAAGATTACCACAGTGAGTTGGGTGATCTAGGAGATATTCACGATGATGAC 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    841 GATCCTCATGAAGATTACCACAGTGAGTTGGGTGATCTAGGAGATATTCACGATGATGAC 900

Qy    901 CATGGCGTTGTCAATGAAAGCCACAGATATTCCTGGATCAATATCTTCGGTGATGACAGT 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    901 CATGGCGTTGTCAATGAAAGCCACAGATATTCCTGGATCAATATCTTCGGTGATGACAGT 960

Qy    961 GTCCTGGGACGTTCTATTGCCATTCACCAAAGAGACCATCTTCATAAAAGTGCCAAAATT 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    961 GTCCTGGGACGTTCTATTGCCATTCACCAAAGAGACCATCTTCATAAAAGTGCCAAAATT 1020

Qy   1021 GCCTGTTGTGTCATAGGACGTGGACAGAGCCATCCAGAAATTGTTCACAGAGCTAAATGT 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 GCCTGTTGTGTCATAGGACGTGGACAGAGCCATCCAGAAATTGTTCACAGAGCTAAATGT 1080

Qy   1081 GTTGTCAGACCTAATACAGAATCTACTGGTTTACATCACCATGTCTCTGGTTCTATAACA 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1081 GTTGTCAGACCTAATACAGAATCTACTGGTTTACATCACCATGTCTCTGGTTCTATAACA 1140

Qy   1141 TTCGAACAGACCCCTGGAGGATCAACACATATGACGGCTGATCTCAAAGGATTTAACGTT 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1141 TTCGAACAGACCCCTGGAGGATCAACACATATGACGGCTGATCTCAAAGGATTTAACGTT 1200

Qy   1201 AGTGAGGACTTGTCACATCATCGTCATGGTGTGCAGCTCCATGAATGGGGAGATATGTCC 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1201 AGTGAGGACTTGTCACATCATCGTCATGGTGTGCAGCTCCATGAATGGGGAGATATGTCC 1260

Qy   1261 CATGGCTGTCACTCCTTAGGCAGAATGTACCATGGTCATGATGATGCTCATGACCCCAAA 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1261 CATGGCTGTCACTCCTTAGGCAGAATGTACCATGGTCATGATGATGCTCATGACCCCAAA 1320

Qy   1321 AGACCTGGTGACCTTGGTGATGTTATAGATGATTCCCATGGCATCGTTCATTCAACTAGA 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1321 AGACCTGGTGACCTTGGTGATGTTATAGATGATTCCCATGGCATCGTTCATTCAACTAGA 1380

Qy   1381 ACCTTTGATCATCTTAATGTTGAAGATCTTAACGCACGTTCCCTTGTGATTATGCAGGGC 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1381 ACCTTTGATCATCTTAATGTTGAAGATCTTAACGCACGTTCCCTTGTGATTATGCAGGGC 1440

Qy   1441 GGACATGAGGTCGAGAGTGAGAGGGTTGCTTGCTGTGTTATAGGACGGGCA 1491
          |||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1441 GGACATGAGGTCGAGAGTGAGAGGGTTGCTTGCTGTGTTATAGGACGGGCA 1491
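
The nucleotide query is 1491 bases long, which is three times the 497 residues of the protein query used in the earlier searches, and its first codons appear to encode the start of that protein (DGEQCNDGQN...). As a cross-check only, the sketch below translates a DNA string with the standard genetic code and resolves the IUPAC ambiguity codes Y (C or T) and R (A or G) when all expansions give the same residue. The function and table names are illustrative and are not part of the search software.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Only the two-base IUPAC codes needed here; other symbols translate to X.
AMBIGUOUS = {"Y": "CT", "R": "AG"}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; ambiguous codons give X unless unambiguous."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        options = [AMBIGUOUS.get(base, base) for base in dna[pos:pos + 3]]
        residues = {CODON_TABLE.get("".join(c), "X") for c in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


if __name__ == "__main__":
    first_60 = "GAYGGGGAGCAGTGTAACGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT"
    print(translate(first_60))
```

The first codon GAY expands to GAC and GAT, both of which encode aspartate, so the ambiguity does not affect the translated residue.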



RESULT 2
BD268170
LOCUS       BD268170     1611 bp    DNA     linear   PAT 17-JUL-2003
DEFINITION  Serine protease inhibiors.
ACCESSION   BD268170
VERSION     BD268170.1  GI:33077938
KEYWORDS    JP 2002534063-A/2.
SOURCE      unidentified
  ORGANISM  unidentified
            unclassified.
REFERENCE   1  (bases 1 to 1611)
  AUTHORS   Scotti,P.D., Dearing,S.C., Greenwood,D.R. and Newcomb,R.D.
  TITLE     Serine protease inhibiors
  JOURNAL   Patent: JP 2002534063-A 2 15-OCT-2002;
            THE HORTICULTURE AND FOOD RESEARCH INSTITUTE OF NEW ZEALAND LTD
COMMENT     OS   Shellfish
            PN   JP 2002534063-A/2
            PD   15-OCT-2002
            PF   23-DEC-1999 JP 2000591076
            PR   23-DEC-1998 NZ 333568, 23-JUL-1999 NZ 336906
            PI   PAUL DOUGLAS SCOTTI, SALLY CAROLINE DEARING, DAVID ROGER
            PI   GREENWOOD,
            PI   RICHARD DAVID NEWCOMB
            PC   C12N15/09,A23L1/305,A61K38/00,A61P7/04,A61P43/00,C07K1/14,
            PC   C07K14/435,
            PC   C12N1/15,C12N1/19,C12N1/21,C12N5/10,C12N9/99//(C12N9/99,C12R1:
            PC   91),
            PC   C12N15/00,C12N5/00,A61K37/02
            CC   Serine protease inhibiors
            FH   Key             Location/Qualifiers
            FT   source          1..1672
            FT                   /organism='Shellfish'.
FEATURES             Location/Qualifiers
     source          1..1611
                     /organism="unidentified"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:32644"
ORIGIN

Query Match 100.0%; Score 1490.6; DB 6; Length 1611; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1491; Conservative 0; Mismatches 0; Indels 0; Gaps 0 

Qy      1 GAYGGGGAGCAGTGTAACGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 GAYGGGGAGCAGTGTAACGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT 60

Qy     61 GATCACCATGACGACCATGATGATGATGATGAAACAATGCACTATGCCCAGTGTGAAATG 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 GATCACCATGACGACCATGATGATGATGATGAAACAATGCACTATGCCCAGTGTGAAATG 120

Qy    121 GAACCAAACCCTCATATGGCTAGCAGCCTTCACCACCATGTCCATGGCAGCATAGAGTTG 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 GAACCAAACCCTCATATGGCTAGCAGCCTTCACCACCATGTCCATGGCAGCATAGAGTTG 180

Qy    181 TCACAGAAGGGTCATGGAGCTGTTTATCTAGAACTTCATCTTGTCGGATTCAACACAAGT 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 TCACAGAAGGGTCATGGAGCTGTTTATCTAGAACTTCATCTTGTCGGATTCAACACAAGT 240

Qy    241 GAAGACCATGACGACCACCATCATGGACTTCATCTGCACATGCTTGGTGACATGTCAGCA 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 GAAGACCATGACGACCACCATCATGGACTTCATCTGCACATGCTTGGTGACATGTCAGCA 300

Qy    301 GGTTGTGATTCTATTGGCGAACTGTACAATGCTCACCCAGAAAAACATGCTGACCCTGGT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 GGTTGTGATTCTATTGGCGAACTGTACAATGCTCACCCAGAAAAACATGCTGACCCTGGT 360

Qy    361 GACCTCGGTGACCTGGTTGACGATGATAGGGGCGTGGTTAATGAAGTTCATCATTATGCT 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 GACCTCGGTGACCTGGTTGACGATGATAGGGGCGTGGTTAATGAAGTTCATCATTATGCT 420

Qy    421 TGGTTGGACATTGATGGTACAGCACCAAACACCGAAGCTCTCATTGGACACTCAATGACT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    421 TGGTTGGACATTGATGGTACAGCACCAAACACCGAAGCTCTCATTGGACACTCAATGACT 480

Qy    481 ATTTTACAAGGGAGTCACACCGATGCTGATACCCCAGCCAGTAGAATCGCCTGTTGTGTT 540
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    481 ATTTTACAAGGGAGTCACACCGATGCTGATACCCCAGCCAGTAGAATCGCCTGTTGTGTT 540

Qy    541 ATTGGTCATGGAAAAGCTCGCCCAGAAACAGCAGCTGCTCTACATCACGAGCTAGAGGAA 600
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    541 ATTGGTCATGGAAAAGCTCGCCCAGAAACAGCAGCTGCTCTACATCACGAGCTAGAGGAA 600

Qy    601 GATAAAACTGAGCATTATGCCCATTGTGACGTAAGATCTAATACACACCAACCAAAGGCT 660
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    601 GATAAAACTGAGCATTATGCCCATTGTGACGTAAGATCTAATACACACCAACCAAAGGCT 660

Qy    661 CTTCATCATCATGTCCACGGAACCATCGATTTCAAACAAGTTGGTTATGGTGACCTTGAA 720
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    661 CTTCATCATCATGTCCACGGAACCATCGATTTCAAACAAGTTGGTTATGGTGACCTTGAA 720

Qy    721 GTGTCCTACCATTTAGAGGGATTTAATGTAAGTGATGACCACAAAGATCATCTCCATGAC 780
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    721 GTGTCCTACCATTTAGAGGGATTTAATGTAAGTGATGACCACAAAGATCATCTCCATGAC 780

Qy    781 GTACAGATCTACGCCAACGGTGACCTGACCAGTGGATGTGATAACCTCGGTGCTAAATAT 840
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    781 GTACAGATCTACGCCAACGGTGACCTGACCAGTGGATGTGATAACCTCGGTGCTAAATAT 840

Qy    841 GATCCTCATGAAGATTACCACAGTGAGTTGGGTGATCTAGGAGATATTCACGATGATGAC 900
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    841 GATCCTCATGAAGATTACCACAGTGAGTTGGGTGATCTAGGAGATATTCACGATGATGAC 900

Qy    901 CATGGCGTTGTCAATGAAAGCCACAGATATTCCTGGATCAATATCTTCGGTGATGACAGT 960
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    901 CATGGCGTTGTCAATGAAAGCCACAGATATTCCTGGATCAATATCTTCGGTGATGACAGT 960

Qy    961 GTCCTGGGACGTTCTATTGCCATTCACCAAAGAGACCATCTTCATAAAAGTGCCAAAATT 1020
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    961 GTCCTGGGACGTTCTATTGCCATTCACCAAAGAGACCATCTTCATAAAAGTGCCAAAATT 1020

Qy   1021 GCCTGTTGTGTCATAGGACGTGGACAGAGCCATCCAGAAATTGTTCACAGAGCTAAATGT 1080
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1021 GCCTGTTGTGTCATAGGACGTGGACAGAGCCATCCAGAAATTGTTCACAGAGCTAAATGT 1080

Qy   1081 GTTGTCAGACCTAATACAGAATCTACTGGTTTACATCACCATGTCTCTGGTTCTATAACA 1140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1081 GTTGTCAGACCTAATACAGAATCTACTGGTTTACATCACCATGTCTCTGGTTCTATAACA 1140

Qy   1141 TTCGAACAGACCCCTGGAGGATCAACACATATGACGGCTGATCTCAAAGGATTTAACGTT 1200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1141 TTCGAACAGACCCCTGGAGGATCAACACATATGACGGCTGATCTCAAAGGATTTAACGTT 1200

Qy   1201 AGTGAGGACTTGTCACATCATCGTCATGGTGTGCAGCTCCATGAATGGGGAGATATGTCC 1260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1201 AGTGAGGACTTGTCACATCATCGTCATGGTGTGCAGCTCCATGAATGGGGAGATATGTCC 1260

Qy   1261 CATGGCTGTCACTCCTTAGGCAGAATGTACCATGGTCATGATGATGCTCATGACCCCAAA 1320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1261 CATGGCTGTCACTCCTTAGGCAGAATGTACCATGGTCATGATGATGCTCATGACCCCAAA 1320

Qy   1321 AGACCTGGTGACCTTGGTGATGTTATAGATGATTCCCATGGCATCGTTCATTCAACTAGA 1380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1321 AGACCTGGTGACCTTGGTGATGTTATAGATGATTCCCATGGCATCGTTCATTCAACTAGA 1380

Qy   1381 ACCTTTGATCATCTTAATGTTGAAGATCTTAACGCACGTTCCCTTGTGATTATGCAGGGC 1440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1381 ACCTTTGATCATCTTAATGTTGAAGATCTTAACGCACGTTCCCTTGTGATTATGCAGGGC 1440

Qy   1441 GGACATGAGGTCGAGAGTGAGAGGGTTGCTTGCTGTGTTATAGGACGGGCA 1491
          |||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1441 GGACATGAGGTCGAGAGTGAGAGGGTTGCTTGCTGTGTTATAGGACGGGCA 1491



RESULT 3
AF273766
LOCUS       AF273766     1700 bp    mRNA    linear   INV 20-MAR-2001
DEFINITION  Perna canaliculus pernin precursor, mRNA, complete cds.
ACCESSION   AF273766
VERSION     AF273766.1  GI:13383377
KEYWORDS
SOURCE      Perna canaliculus (greenshell mussel)
  ORGANISM  Perna canaliculus
            Eukaryota; Metazoa; Mollusca; Bivalvia; Pteriomorphia; Mytiloida;
            Mytiloidea; Mytilidae; Perna.
REFERENCE   1  (bases 1 to 1700)
  AUTHORS   Scotti,P.D., Dearing,S.C., Greenwood,D.R. and Newcomb,R.D.
  TITLE     Pernin: a novel, self-aggregating haemolymph protein from the New
            Zealand green-lipped mussel, Perna canaliculus (Bivalvia:
            Mytilidae)
  JOURNAL   Comp. Biochem. Physiol. B, Biochem. Mol. Biol. 128 (4), 767-779
            (2001)
  MEDLINE   21186417
   PUBMED   11290459
REFERENCE   2  (bases 1 to 1700)
  AUTHORS   Scotti,P.D., Dearing,S.C., Greenwood,D.R. and Newcomb,R.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-MAY-2000) The Horticulture and Food Research
            Institute of New Zealand Ltd, 120 Mt. Albert Road, Auckland, New
            Zealand
FEATURES             Location/Qualifiers
     source          1..1700
                     /organism="Perna canaliculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:38949"
     CDS             34..1587
                     /note="haemolymph protein; N-terminus determined by
                     microsequencing of pernin: DGEQCNDGQN and HPLC-purified
                     CNBr and tryptic digest fragments: ASSLHHHVHG; VVNEVHH;
                     GQSHPEIVH; YHGHDDA; QGGHEVESERVACCVIGRA"
                     /codon_start=1
                     /evidence=experimental
                     /product="pernin precursor"
                     /protein_id="AAK20952.1"
                     /db_xref="GI:13383378"
                     /translation="MKLILLSLVVFAALALQVRADGEQCNDGQNKDDHHDDHHDDHHD
                     DHDDDDETMHYAQCEMEPNPHMASSLHHHVHGSIELSQQGHGAVYLELHLVGFNTSED
                     HDDHHHGLHLHMLGDMSAGCDSIGELYNAHPEKHADPGDLGDLVDDDRGVVNEVHHYA
                     WLDIDGTAPNTEALIGHSMTILQGSHTDADTPASRIACCVIGHGKARPETAAALHHEL
                     EEDKTEHYAHCDVRSNTHQPKALHHHVHGTIDFKQVGYGDLEVSYHLEGFNVSDDHKD
                     HLHDVQIYANGDLTSGCDNLGAKYDPHEDYHSELGDLGDIHDDDHGVVNESHRYSWIN
                     IFGDDSVLGRSIAIHQRDHLHKSAKIACCVIGRGQSHPEIVHRAKCVVRPNTESTGLH
                     HHVSGSITFEQTPGGSTHMTADLKGFNVSEDLSHHRHGVQLHEWGDMSHGCHSLGRMY
                     HGHDDAHDPKRPGDLGDVIDDSHGIVHSTRTFDHLNVEDLNARSLVIMQGGHEVESER
                     VACCVIGRA"
     sig_peptide     34..93
     mat_peptide     94..1584
                     /product="pernin"
     polyA_signal    1650..1655
ORIGIN



Query Match 99.5%; Score 14 84.2; DB 3; Length 1700; 

Best Local Similarity 99.7%; Pred. No. 0; 

Matches I48 6; Conservative 1; Mismatches 4; Indels 0; Gaps 0 
Qy    1 GAYGGGGAGCAGTGTAACGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT 60
        ||:|| || |||||||| ||||||||||||||||||||||||||||||||||||||||||
Db   94 GATGGCGAACAGTGTAATGATGGGCAGAACAAAGATGACCACCATGACGACCACCACGAT 153

Qy   61 GATCACCATGACGACCATGATGATGATGATGAAACAATGCACTATGCCCAGTGTGAAATG 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  154 GATCACCATGACGACCATGATGATGATGATGAAACAATGCACTATGCCCAGTGTGAAATG 213

Qy  121 GAACCAAACCCTCATATGGCTAGCAGCCTTCACCACCATGTCCATGGCAGCATAGAGTTG 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  214 GAACCAAACCCTCATATGGCTAGCAGCCTTCACCACCATGTCCATGGCAGCATAGAGTTG 273

Qy  181 TCACAGAAGGGTCATGGAGCTGTTTATCTAGAACTTCATCTTGTCGGATTCAACACAAGT 240
        |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  274 TCACAGCAGGGTCATGGAGCTGTTTATCTAGAACTTCATCTTGTCGGATTCAACACAAGT 333

Qy  241 GAAGACCATGACGACCACCATCATGGACTTCATCTGCACATGCTTGGTGACATGTCAGCA 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  334 GAAGACCATGACGACCACCATCATGGACTTCATCTGCACATGCTTGGTGACATGTCAGCA 393

Qy  301 GGTTGTGATTCTATTGGCGAACTGTACAATGCTCACCCAGAAAAACATGCTGACCCTGGT 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  394 GGTTGTGATTCTATTGGCGAACTGTACAATGCTCACCCAGAAAAACATGCTGACCCTGGT 453

Qy  361 GACCTCGGTGACCTGGTTGACGATGATAGGGGCGTGGTTAATGAAGTTCATCATTATGCT 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  454 GACCTCGGTGACCTGGTTGACGATGATAGGGGCGTGGTTAATGAAGTTCATCATTATGCT 513

Qy  421 TGGTTGGACATTGATGGTACAGCACCAAACACCGAAGCTCTCATTGGACACTCAATGACT 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  514 TGGTTGGACATTGATGGTACAGCACCAAACACCGAAGCTCTCATTGGACACTCAATGACT 573

Qy  481 ATTTTACAAGGGAGTCACACCGATGCTGATACCCCAGCCAGTAGAATCGCCTGTTGTGTT 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  574 ATTTTACAAGGGAGTCACACCGATGCTGATACCCCAGCCAGTAGAATCGCCTGTTGTGTT 633

Qy  541 ATTGGTCATGGAAAAGCTCGCCCAGAAACAGCAGCTGCTCTACATCACGAGCTAGAGGAA 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  634 ATTGGTCATGGAAAAGCTCGCCCAGAAACAGCAGCTGCTCTACATCACGAGCTAGAGGAA 693

Qy  601 GATAAAACTGAGCATTATGCCCATTGTGACGTAAGATCTAATACACACCAACCAAAGGCT 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  694 GATAAAACTGAGCATTATGCCCATTGTGACGTAAGATCTAATACACACCAACCAAAGGCT 753

Qy  661 CTTCATCATCATGTCCACGGAACCATCGATTTCAAACAAGTTGGTTATGGTGACCTTGAA 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  754 CTTCATCATCATGTCCACGGAACCATCGATTTCAAACAAGTTGGTTATGGTGACCTTGAA 813

Qy  721 GTGTCCTACCATTTAGAGGGATTTAATGTAAGTGATGACCACAAAGATCATCTCCATGAC 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  814 GTGTCCTACCATTTAGAGGGATTTAATGTAAGTGATGACCACAAAGATCATCTCCATGAC 873

Qy  781 GTACAGATCTACGCCAACGGTGACCTGACCAGTGGATGTGATAACCTCGGTGCTAAATAT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  874 GTACAGATCTACGCCAACGGTGACCTGACCAGTGGATGTGATAACCTCGGTGCTAAATAT 933

Qy  841 GATCCTCATGAAGATTACCACAGTGAGTTGGGTGATCTAGGAGATATTCACGATGATGAC 900
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  934 GATCCTCATGAAGATTACCACAGTGAGTTGGGTGATCTAGGAGATATTCACGATGATGAC 993

Qy  901 CATGGCGTTGTCAATGAAAGCCACAGATATTCCTGGATCAATATCTTCGGTGATGACAGT 960
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  994 CATGGCGTTGTCAATGAAAGCCACAGATATTCCTGGATCAATATCTTCGGTGATGACAGT 1053

Qy  961 GTCCTGGGACGTTCTATTGCCATTCACCAAAGAGACCATCTTCATAAAAGTGCCAAAATT 1020
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1054 GTCCTGGGACGTTCTATTGCCATTCACCAAAGAGACCATCTTCATAAAAGTGCCAAAATT 1113

Qy 1021 GCCTGTTGTGTCATAGGACGTGGACAGAGCCATCCAGAAATTGTTCACAGAGCTAAATGT 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1114 GCCTGTTGTGTCATAGGACGTGGACAGAGCCATCCAGAAATTGTTCACAGAGCTAAATGT 1173

Qy 1081 GTTGTCAGACCTAATACAGAATCTACTGGTTTACATCACCATGTCTCTGGTTCTATAACA 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1174 GTTGTCAGACCTAATACAGAATCTACTGGTTTACATCACCATGTCTCTGGTTCTATAACA 1233

Qy 1141 TTCGAACAGACCCCTGGAGGATCAACACATATGACGGCTGATCTCAAAGGATTTAACGTT 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1234 TTCGAACAGACCCCTGGAGGATCAACACATATGACGGCTGATCTCAAAGGATTTAACGTT 1293

Qy 1201 AGTGAGGACTTGTCACATCATCGTCATGGTGTGCAGCTCCATGAATGGGGAGATATGTCC 1260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1294 AGTGAGGACTTGTCACATCATCGTCATGGTGTGCAGCTCCATGAATGGGGAGATATGTCC 1353

Qy 1261 CATGGCTGTCACTCCTTAGGCAGAATGTACCATGGTCATGATGATGCTCATGACCCCAAA 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1354 CATGGCTGTCACTCCTTAGGCAGAATGTACCATGGTCATGATGATGCTCATGACCCCAAA 1413

Qy 1321 AGACCTGGTGACCTTGGTGATGTTATAGATGATTCCCATGGCATCGTTCATTCAACTAGA 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1414 AGACCTGGTGACCTTGGTGATGTTATAGATGATTCCCATGGCATCGTTCATTCAACTAGA 1473

Qy 1381 ACCTTTGATCATCTTAATGTTGAAGATCTTAACGCACGTTCCCTTGTGATTATGCAGGGC 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1474 ACCTTTGATCATCTTAATGTTGAAGATCTTAACGCACGTTCCCTTGTGATTATGCAGGGC 1533

Qy 1441 GGACATGAGGTCGAGAGTGAGAGGGTTGCTTGCTGTGTTATAGGACGGGCA 1491
        |||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1534 GGACATGAGGTCGAGAGTGAGAGGGTTGCTTGCTGTGTTATAGGACGGGCA 1584
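
The summary figures reported for this alignment can be cross-checked with a little arithmetic. The sketch below assumes, without confirmation from the report itself, that Best Local Similarity is the count of identical columns divided by all aligned columns (matches, conservative substitutions, mismatches and indels). The function name is hypothetical and is not part of the search software.

```python
def best_local_similarity(matches, conservative, mismatches, indels=0):
    """Percent identity over all aligned columns (assumed definition)."""
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns


# Counts taken from the alignment header in the report.
similarity = best_local_similarity(
    matches=1486, conservative=1, mismatches=4, indels=0
)
print(f"{similarity:.1f}%")  # prints 99.7%
```

Under that assumption the aligned columns total 1491, which equals the query span 1 to 1491 shown in the alignment.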



Database : N_Geneseq_23Sep04:*

1: geneseqn1980s:*
2: geneseqn1990s:*
3: geneseqn2000s:*
4: geneseqn2001as:*
5: geneseqn2001bs:*
6: geneseqn2002as:*
7: geneseqn2002bs:*
8: geneseqn2003as:*
9: geneseqn2003bs:*
10: geneseqn2003cs:*
11: geneseqn2003ds:*
12: geneseqn2004s:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution. 



SUMMARIES 



                  %
Result          Query
   No.   Score  Match  Length  DB  ID          Description

     1  1490.6  100.0    1491   3  AAA47150    Aaa47150 DNA encod
     2  1490.6  100.0    1611   3  AAA47151    Aaa47151 DNA encod
     3   128.4    8.6     606   8  AAD48291    Aad48291 Crassostr
     4    56.8    3.8    1083   5  AAS76745    Aas76745 DNA encod
     5    54.4    3.6    2000   8  ADA71938    Ada71938 Rice gene
     6    52.8    3.5  110000  12  AD034927 1  Continuation (2 of
     7    52.6    3.5     583   4  AAI23356    Aai23356 Probe #13
     8    52.6    3.5     583   4  ABA68463    Aba68463 Human foe
     9    52.6    3.5     583   4  AAI48680    Aai48680 Probe #17
    10    52.6    3.5     583   4  ABA50512    Aba50512 Human bre
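
The % Query Match values in this summary look consistent with each hit's score expressed as a percentage of the score of the 100.0 hit (1490.6). That reading is an inference from the listed numbers, not a definition given in the report, and the function name below is hypothetical. A minimal sketch of the check:

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the perfect-match score (assumed definition)."""
    return 100.0 * score / perfect_score


PERFECT_SCORE = 1490.6  # score of the hits listed at 100.0 % Query Match

# Distinct scores listed in the summary table.
for score in (1490.6, 128.4, 56.8, 54.4, 52.8, 52.6):
    percent = query_match_percent(score, PERFECT_SCORE)
    print(f"{score:>7} -> {percent:.1f}")
```

The computed values can be compared line by line with the % Query Match column of the summary.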



Database : Issued_Patents_NA:*

1: /cgn2_6/ptodata/1/ina/5A_COMB.seq:*
2: /cgn2_6/ptodata/1/ina/5B_COMB.seq:*
3: /cgn2_6/ptodata/1/ina/6A_COMB.seq:*
4: /cgn2_6/ptodata/1/ina/6B_COMB.seq:*
5: /cgn2_6/ptodata/1/ina/PCTUS_COMB.seq:*
6: /cgn2_6/ptodata/1/ina/backfiles1.seq:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES 

                  %
Result          Query
   No.   Score  Match  Length  DB  ID                   Description

     1    54.4    3.6     480   4  US-09-248-796A-6301  Sequence 6301, Ap
     2    51.2    3.4     291   4  US-09-248-796A-6300  Sequence 6300, Ap
     3    46.4    3.1     549   4  US-09-248-796A-3913  Sequence 3913, Ap
     4    44.8    3.0    5340   4  US-09-627-122-21     Sequence 21, Appl
     5    42.4    2.8    2518   3  US-09-433-699-3      Sequence 3, Appli
     6    42.4    2.8   10304   4  US-09-627-465B-1     Sequence 1, Appli
     7    42      2.8     496   1  US-08-263-413-23     Sequence 23, Appl
     8    42      2.8     500   1  US-08-263-413-22     Sequence 22, Appl
     9    42      2.8     675   1  US-07-807-043B-2     Sequence 2, Appli
    10    42      2.8     675   1  US-08-299-849B-2     Sequence 2, Appli
    11    42      2.8     675   2  US-08-142-368A-2     Sequence 2, Appli
    12    42      2.8     675   3  US-08-967-727-2      Sequence 2, Appli
    13    42      2.8     675   3  US-08-037-230D-2     Sequence 2, Appli
    14    42      2.8     675   4  US-09-583-850-2      Sequence 2, Appli



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                  %
Result          Query
   No.   Score  Match  Length  DB  ID                         Description

     1    52.6    3.5     583   9                             Sequence 20772, A
     2    52.6    3.5    1959   9                             Sequence 4012, Ap
c    3    49.2    3.3     327   9  US-09-864-761-28059        Sequence 28059, A
     4    49      3.3     676  17  [illegible]                Sequence 44631, A
c    5    48.2    3.2  744802  15  US-10-292-798-[illegible]  Sequence [illegible], Ap
     6    47.8    3.2    1168  15  US-10-017-161-2179         Sequence 2179, Ap
     7    47.8    3.2    1168  15  US-10-292-798-1825         Sequence 1825, Ap
     8    47      3.2    1631  15  US-10-369-493-36458        Sequence 36458, A
     9    46.8    3.1     728  17  US-10-767-795-5840         Sequence 5840, Ap
    10    46.6    3.1     717  18  US-10-425-115-15020        Sequence 15020, A
c   11    46.2    3.1     456   9  US-09-864-761-11468        Sequence 11468, A
    12    45.8    3.1     493  17  US-10-767-701-31233        Sequence 31233, A
c   13    45.8    3.1     785  15  US-10-029-386-22627        Sequence 22627, A
c   14    45      3.0     506  15  US-10-029-386-20619        Sequence 20619, A
c   15    44.8    3.0   58985  10  US-09-901-152-3            Sequence 3, Appli
c   16    44.8    3.0  143601  10  US-09-855-824-3            Sequence 3, Appli
    17    44.4    3.0    1028  18  US-10-739-930-4488         Sequence 4488, Ap
    18    44.2    3.0     574   9  US-09-864-761-228          Sequence 228, App
    19    44.2    3.0     669   9  US-09-864-761-17051        Sequence 17051, A
    20    44.2    3.0     926  18  US-10-425-115-54567        Sequence 54567, A



